Gene expression analysis of M. tuberculosis in patients with and without HIV co-infection

Group 3: Anne Skov-Johannessen s184330, Dea F. Skipper, Helene B. L. Petersen s194699, Johanne B. Overgaard s194691 and Rebecca C. Grenov

Introduction

Tuberculosis (TB) is the leading cause of death in HIV-infected individuals.

  • Weakened immune system

  • Limits the sensitivity of TB diagnosis

Support vector machine (SVM) used to find a 251-gene signature

  • Genes involved in Immunological, Infectious and Inflammatory Disease

Our aim:

  • Identify genes that are significantly differentially expressed in HIV patients with TB co-infection

  • Compare with the 251-gene signature found with the SVM model

Methods

Keep it clean and tidy:

  • Select variables

  • Mutate variables

  • Handle key-variable

  • Handle replications
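The four tidying steps can be illustrated on a tiny invented table. This is only a sketch in plain Python: the field names (patient_id, disease_state, replicate, expression, batch), the derived has_tb variable and the choice to average replicates per patient are all assumptions made for illustration, since the slides do not state how each step was carried out.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records; every field name and value is invented.
raw = [
    {"patient_id": "P1", "disease_state": "HIV/TB", "replicate": 1, "expression": 8.0, "batch": "a"},
    {"patient_id": "P1", "disease_state": "HIV/TB", "replicate": 2, "expression": 10.0, "batch": "a"},
    {"patient_id": "P2", "disease_state": "HIV", "replicate": 1, "expression": 4.0, "batch": "b"},
]

# Select: keep only the variables needed downstream.
keep = ("patient_id", "disease_state", "expression")
selected = [{k: row[k] for k in keep} for row in raw]

# Mutate: derive a new variable from an existing one (assumed recoding).
for row in selected:
    row["has_tb"] = row["disease_state"].endswith("TB")

# Key variable and replicates: group rows by the patient key and average
# the replicate measurements so each patient appears once (an assumption).
groups = defaultdict(list)
for row in selected:
    groups[(row["patient_id"], row["has_tb"])].append(row["expression"])

tidy = [
    {"patient_id": pid, "has_tb": tb, "expression": mean(values)}
    for (pid, tb), values in groups.items()
]
```

After these steps each patient contributes exactly one row, which is what the later normalization and modelling steps expect.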

Methods

Normalization - minimize technical variability

Log transformation - stabilize variance, reduce skewness
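A minimal sketch of the log transformation, assuming base 2 and a pseudocount of 1 to avoid taking the logarithm of zero. Neither choice is stated on the slide, and the intensities are invented.

```python
import math
from statistics import pvariance

def log2_transform(values, pseudocount=1.0):
    """Return log2(value + pseudocount) for each raw intensity."""
    return [math.log2(v + pseudocount) for v in values]

# Invented, strongly right-skewed raw intensities.
raw_intensities = [1.0, 3.0, 7.0, 1023.0]
logged = log2_transform(raw_intensities)

# The variance is far smaller on the log scale than on the raw scale.
print(pvariance(raw_intensities), pvariance(logged))
```

The single very large value dominates the raw scale, while on the log scale it sits only a few units from the rest.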

Quantile Normalization:

  1. Sort the expression levels for each patient.
  2. Calculate the mean expression level across patients at each rank.
  3. Assign this mean to every gene that holds that rank.
  4. Rearrange the genes for each patient back into their original order.
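The four steps can be sketched in plain Python as follows. The function name quantile_normalize and the toy data are illustrative, and ties are broken by original gene position, which is a simplification rather than something taken from the slides.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length per-patient value lists.

    Each inner list holds one patient's expression values in the same
    gene order. Ties are broken by original position (a simplification).
    """
    n_genes = len(columns[0])
    # Step 1: for each patient, gene indices ordered by expression level.
    orders = [
        sorted(range(n_genes), key=lambda g, col=col: col[g])
        for col in columns
    ]
    # Step 2: mean expression across patients at each rank.
    rank_means = [
        sum(col[order[r]] for col, order in zip(columns, orders)) / len(columns)
        for r in range(n_genes)
    ]
    # Steps 3 and 4: give each gene the mean of its rank, written back
    # at the gene's original position.
    normalized = []
    for order in orders:
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized

patients = [
    [5.0, 2.0, 3.0],  # patient A, genes g1..g3 (invented values)
    [4.0, 1.0, 6.0],  # patient B
]
result = quantile_normalize(patients)
# result == [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization every patient has the same set of values, so the distributions are identical and only the ranking of genes within each patient differs.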

Results - PCA

Variance explained by the principal components

  • First PC explains 15% of the variance

  • 31 PCs needed to explain 90% of variance

  • Difficult to compress 47,000 genes onto a few PCs
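The number of PCs needed to reach a given share of variance follows from the cumulative explained-variance ratio. The sketch below assumes the per-component variances are already available and sorted in descending order; the values are invented and the helper names are illustrative.

```python
def explained_variance_ratio(variances):
    """Share of total variance carried by each principal component."""
    total = sum(variances)
    return [v / total for v in variances]

def components_needed(variances, threshold=0.90):
    """Smallest number of leading PCs whose cumulative share reaches threshold."""
    cumulative = 0.0
    for k, ratio in enumerate(explained_variance_ratio(variances), start=1):
        cumulative += ratio
        if cumulative >= threshold:
            return k
    return len(variances)

# Invented variances of four PCs, sorted from largest to smallest.
pc_variances = [5.0, 3.0, 1.5, 0.5]
ratios = explained_variance_ratio(pc_variances)
n_pcs = components_needed(pc_variances)  # 3 PCs reach 90% here
```

When the first ratio is small, as with the 15% reported above, many components are needed before the cumulative share approaches the threshold.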

Results - PCA

Scatter plot of projected observations onto PC1 and PC2

  • Slight separation by disease state along PC1

  • No clear separation by gender

  • Need for further analysis of disease state
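Projecting observations onto PC1 and PC2 amounts to centering the data, finding the two leading eigenvectors of the covariance matrix and taking dot products. The sketch below uses power iteration with deflation purely to stay within the standard library; this is a stand-in technique, not necessarily how the PCA in these slides was computed, and the data are invented.

```python
import math

def center(rows):
    """Subtract each column (gene) mean from every observation."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already centered rows."""
    n, p = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

def top_eigenvector(matrix, iterations=500):
    """Leading eigenvector by power iteration.

    Assumes the start vector is not orthogonal to the leading eigenvector.
    """
    p = len(matrix)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        new = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    return vec

def project(rows, n_components=2):
    """Scores of each observation on the leading principal components."""
    centered = center(rows)
    cov = covariance(centered)
    p = len(cov)
    axes = []
    for _ in range(n_components):
        v = top_eigenvector(cov)
        mv = [sum(m * x for m, x in zip(row, v)) for row in cov]
        eigenvalue = sum(a * b for a, b in zip(v, mv))
        axes.append(v)
        # Deflation: remove the component just found before the next pass.
        cov = [
            [cov[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
            for i in range(p)
        ]
    return [[sum(x * a for x, a in zip(row, axis)) for axis in axes] for row in centered]

# Four invented observations (rows) measured on three genes (columns).
samples = [
    [12.0, 5.0, 8.0],
    [8.0, 5.0, 8.0],
    [10.0, 6.0, 6.0],
    [10.0, 4.0, 6.0],
]
scores = project(samples)  # one (PC1, PC2) pair per observation
```

Plotting the first score against the second, colored by disease state or gender, gives the kind of scatter plot described above.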

Results - Linear Regression

Forest plot

  • Most significant genes are downregulated
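A forest plot shows one estimate and confidence interval per gene. The sketch below assumes a simple per-gene model, expression regressed on a 0/1 disease indicator; the slides do not give the exact model, the data are invented, and the interval uses the normal quantile 1.96 as a simplification.

```python
import math

def fit_gene(disease, expression):
    """Ordinary least squares of expression on a 0/1 disease indicator.

    Returns (slope, lower, upper) with an approximate 95% interval based
    on the normal quantile 1.96; a t quantile would suit small samples better.
    """
    n = len(disease)
    mean_x = sum(disease) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in disease)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(disease, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum(
        (y - intercept - slope * x) ** 2 for x, y in zip(disease, expression)
    )
    std_error = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, slope - 1.96 * std_error, slope + 1.96 * std_error

# Invented data: 0 = HIV only, 1 = HIV with TB co-infection.
disease = [0, 0, 0, 1, 1, 1]
genes = {
    "GENE_A": [8.0, 9.0, 10.0, 5.0, 6.0, 7.0],  # hypothetical gene names
    "GENE_B": [5.0, 6.0, 7.0, 5.5, 6.0, 6.5],
}
estimates = {name: fit_gene(disease, values) for name, values in genes.items()}
```

A negative slope whose interval excludes zero corresponds to a downregulated gene, which is what each row of the forest plot displays.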

Results - Linear Regression

Volcano plot

  • None of the significant genes are part of the 251-gene tuberculosis signature
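A volcano plot places each gene at (estimate, -log10 p), and the comparison with the signature is a set intersection. In the sketch below the gene names, estimates, p-values, the 0.05 cutoff and the stand-in signature set are all invented for illustration.

```python
import math

def volcano_points(results, alpha=0.05):
    """Map gene -> (estimate, -log10 p-value, is_significant)."""
    return {
        gene: (estimate, -math.log10(p), p < alpha)
        for gene, (estimate, p) in results.items()
    }

# Hypothetical per-gene regression results: (estimate, p-value).
results = {
    "GENE_A": (-1.8, 0.0001),
    "GENE_B": (0.2, 0.40),
    "GENE_C": (1.1, 0.01),
}
# Stand-in for the 251-gene signature (invented members).
signature = {"GENE_B", "GENE_D"}

points = volcano_points(results)
significant = {gene for gene, (_, _, is_sig) in points.items() if is_sig}
overlap = significant & signature  # empty when no significant gene is in the signature
```

An empty overlap set is the programmatic counterpart of the finding stated above.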